Single-cell transcriptome profiling of the stepwise progression of head and neck cancer

Head and neck squamous cell carcinoma (HNSCC) undergoes stepwise progression from normal tissues to precancerous leukoplakia, primary HNSCC, and metastasized tumors. To delineate the heterogeneity of tumor cells and their interactions during the progression of HNSCC, we employ single-cell RNA-seq profiling for normal to metastasized tumors. We can identify the carcinoma in situ cells in leukoplakia lesions that are not detected by pathological examination. In addition, we identify the cell type subsets of the Galectin 7B (LGALS7B)-expressing malignant cells and CXCL8-expressing fibroblasts, demonstrating that their abundance in tumor tissue is associated with unfavorable prognostic outcomes. We also demonstrate the interdependent ligand-receptor interaction of COL1A1 and CD44 between fibroblasts and malignant cells, facilitating HNSCC progression. Furthermore, we report that the regulatory T cells in leukoplakia and HNSCC tissues express LAIR2, providing a favorable environment for tumor growth. Taken together, our results update the pathobiological insights into cell-cell interactions during the stepwise progression of HNSCCs.

addition, sample-specific CNAs of the epithelial cells were determined with a cutoff of a cell proportion with CNA greater than 5 %. Then, we defined the recurrent CNAs as the genes observed in at least two samples with overlap between the tissue-specific and the sample-specific CNAs.

Interdependent ligand-receptor interaction analysis
Data for putative ligand-receptor interactions were obtained from a previous study 4 . First, the average expression levels for each ligand-receptor pair were estimated in the malignant cells and the fibroblasts across the tissue types of NL, LP, and CA/LN, respectively. Then, to determine the interdependent ligand-receptor pairs between CAFs and malignant cells, the ligand-receptor pairs were further filtered by the following criteria. (1) The ligands were expressed, but their receptors were not expressed in CAF (ligand vs. receptor, fold difference of average expression > 0. 5). (2) The receptors were expressed, but their ligands were not expressed in the malignant cells (receptor vs. ligand, fold difference of average expression > 0. 5). (3) The ligands were expressed in LP than NL (LP vs. NL, fold difference of average expression > 0.1). Because the basal expression levels of the ligands and the receptors in tumor tissues (CA and LN) were substantially higher than those in non-tumor tissues (NL and LP), we applied different cutoffs of the fold difference for each cell type.

Pseudo-time trajectory analysis
Pseudo-time trajectory analyses were performed using Monocle v2 in the R package 5 . UMI count matrices were used to create a CellDataSet object with default parameters. The T cell clusters were identified using the 'FindVariableFeatures' function of the Seurat R package. Gene expression changes according to the pseudo-times were evaluated using the differentialGeneTest function. To construct UMAP-based single-cell state transitions, we applied single-cell trajectory analysis for fibroblasts using the Monocle v3 6 R package. Dimension reduction and cell clustering were performed using the 'reduce_dimension,' and 'cluster_cell' (k = 50) functions.

Public data analysis
Public data sets for scRNA-Seq of HNSCC patients (GSE103322 and GSE164690) were obtained from Gene Expression Omnibus. After data scaling for malignant cells, the malignant clusters, CC0-CC5, were predicted using the classifiers by applying the nearest template prediction (NTP) algorithm.

Immunohistochemistry
Immunohistochemistry was performed to analyze the expression of galectin-7 (Abcam, ab206435, 1:100). Paraffin tissue sections were briefly deparaffinized with xylene and rehydrated through alcohol and washed in distilled water. After H2O2-induced inactivation for endogenous peroxidase activity and antigen retrieval in pepsin (Dako, Carpinteria, CA) at 37 °C for 30 min, the tissue sections were incubated with the primary antibody for LGALS7B overnight at 4 °C. After washing, signals were detected with 3,3-diaminobenzidine tetrahydrochloride, and the sections were counterstained with hematoxylin.
Immunofluorescent images were taken from each slide using LSM 710 confocal microscope (Carl Zeiss).

Cell viability, migration, and invasion assays
Cells were seeded at a density of 3 X 10 3 cells per well in 96-well plates. At 24 h post-seeding, cells were transfected with siRNA at various times. Cell viability was measured by using the Cell Counting Kit (CCK-8; Dojindo, Tokyo, Japan) according to the manufacturer's instructions. Each experiment was repeated in triplicate.
Cell migration and invasion assays were performed in a 6.5 mm insert with 8.0 μm polyethylene terephthalate membranes in 24-well plates (SPLInsert TM Hanging, SPL Life Sciences). For the migration assay, cells were transfected with siRNAs targeting TP63, ATP1B3, or negative control. After 24 h, the transfected cells (7 × 10 4 ) were seeded in a top chamber of 150 μl of serum-free medium. The culture medium with 10% FBS (800 μl) was added to the bottom chamber, and the cells were incubated for 18 h. For cell invasion assay, the upper surface of the membrane was pre-coated with 10 μg/ml fibronectin (Sigma Aldrich, F1141) in PBS overnight at 4°C. 1 × 10 5 cells/well were seeded on top of the chamber of 150 μl of serum-free medium. The culture medium with 10% FBS (800 μl) was added to the bottom chamber, and the cells were incubated for 24 h. Then, the migrated or invaded cells were fixed with 4% paraformaldehyde phosphate buffer solution (Wako Pure Chemical Industries, 163-20145) for 5 min and stained with 0.1% crystal violet (Sigma Aldrich, C0775) in 20 % methanol solution for 15 min. Non-migrated cells were removed with a cotton swab, and then the numbers of migrated or invaded cells were counted in the random fields under EVOS M7000 Cell Imaging Systems (x200 magnification, Thermo Fisher Scientific). The bound crystal violet was eluted by adding 400 μl of 10 % acetic acid solution (Sigma Aldrich, 45754) into each insert and shaken for 10 min. The eluent on inverted transwells was transferred to a 96-well microplate, and the absorbance at 590 nm was measured using an Epoch Microplate Spectrophotometer (BioTek Instruments). Both experiments were repeated in triplicate, independently.
Co-culture of MSKQLL1 cells and the patient-derived CAFs were implemented using the indirectly co-culture transwell system [6.5 mm insert with 8.0 μm polyethylene terephthalate membranes in 24well plates, SPLInsert TM Hanging, SPL Life Sciences]. The cells were transfected with siCD44 or siCOL1A1, respectively. 1 × 10 5 of the malignant cells were seeded in the top chamber with serumfree medium (150 μl), and 1 × 10 5 of the patient-derived CAFs were seeded in the bottom chamber with culture medium (800 μl, 10% FBS) and incubated for 18 h.

Sphere formation assays
FaDu cells were transfected with 50 nM of the siRNAs targeting TP63, ATP1B3, or negative control.
After 24 h transfection, the single-cell suspension of the transfected cells (3 × 10 3 cells/35 μl/drop) in a complete culture medium was placed on the inner side of a 96-well hanging drop plate (SPL Life Sciences). After 48 h incubation, the spheroids were harvested from the hanging drop plate by pipetting 100 μl of PBS. The spheroids were then washed with PBS and stained using the LIVE/DEADTM Viability/Cytotoxicity Kit (Thermo Fisher Scientific, L3224) by incubating with calcein AM and Ethidium homodimer-1 (EthD-1) at 37°C for 30 min. A confocal microscope (Nikon A1R, Japan, x20 objective, n = 8) was used for imaging, and z-section imaging was performed.
Proteins were visualized using ECL reagents (GE Healthcare Life Sciences, RPN2235) and detected with ImageQuant™ LAS 4000 (FujiFilm, Tokyo, Japan). Antibodies were obtained from several sources,

RT-PCR
Total RNA was extracted using TRIzol reagent (Thermo Fisher Scientific, 15596026). cDNAs Cycling T score correlated with TIL score (left) and dysfunctional T score (right) in TCGA-HNSCC data.
The gray shading represents 95% confidence interval (CI). Cycling T cell, TIL, and dysfunctional T cell scores are estimated as mean expression values of their marker genes, respectively.

Supplementary Tables
Supplementary Table 1